US Senate: loyalty and relationship through the first session of the 117th Congress
INTRODUCTION
In this project I will analyze the US Senate through two research questions:
In the first question, I research the possible connection between the seniority of the members of the upper chamber of the Congress and their position with respect to their Party Leader. I will use each cast vote and recode these looking at the vote expressed by their Leader, starting from considering the equal political division of the US Senate with just the Vice president casting the tie-breaking vote. For this reason, I want to observe if the seniority could influence how senators cast their vote, considering that they used to vote quite more similar to their leader, while the juniors will vote more differently. (2008)
In the second question, I will start from what is now considered as a general opinion about US politics: it is polarized, and the Senate is the place where it is more evident. Starting from this statement, I will use each introduced bill in the first session of the 117th Congress, the one that goes through 2021, and I will observe who are the sponsors and who are their cosponsors. Then I will reorganize the data obtained in a data frame to count the number of connections between them. To show the results of the analysis, I will create a network analysis through which observe the connection between the two parties. (2011)
In my research I will need a these of packages:
library(tidyverse)
library(jsonlite)
library(RCurl)
library(stringi)
library(rio)
library(plotly)
library(htmlwidgets)
library(igraph)
library(xml2)
library(here)FIRST RESEARCH QUESTION
DATA EXTRACTION
To research the data needed for the first question, I will use the API from the website ProPublica that takes its information directly from the pages of the two chambers of Congress. I will use the API to extract two pieces of information:
The complete list of Senators from the 117th Congress,
list_sen <- RCurl::getURL(senator,
httpheader = c(key))
cat(list_sen)
list1 <- fromJSON(list_sen)
list2 <- list1[[3]][[1]]
list3 <- list2[[5]]
sen_list <- as.data.frame(do.call(rbind, list3))The complete list of the votes cast by each senator during this Congress
i <- 1
for(i in 1:nrow(links1)){
result <- as.data.frame(RCurl::getURL(links1$`stri_paste(votes, 1:528, ".json")`,
httpheader = c(links1$key)))
Sys.sleep(1)
}
votes_list <- list()
i <- 1
for (i in 1:528){
vote_step <- fromJSON(result[[i]])
votes_list[[length(votes_list)+1]] <- vote_step
}
sen_vote <- list()
i <- 1
for (i in 1:528){
vote2 <- votes_list[[i]][[3]][[1]][[1]][["positions"]]
sen_vote[[length(sen_vote)+1]] <- vote2
}
off_vote <- list()
i <- 1
for (i in 1:length(sen_vote)){
list_step <- c(sen_vote[[i]])
off_vote[[length(off_vote)+1]] <- list_step
}
all_vote <- list()
i <- 1
for (i in 1:length(sen_vote)){
off_step <- as.data.frame(do.call(rbind, sen_vote[[i]]))
all_vote[[length(all_vote)+1]] <- off_step
}
data_vote <- as.data.frame(do.call(rbind, all_vote))To do these steps, I will use a for loop that will extract and add each of the 528 votes held last year to a data frame, just with two elements: my personal API key and the link of the API request (https://api.propublica.org/congress/v1/117/senate/members.json and https://api.propublica.org/congress/v1/117/senate/sessions/1/votes/) previously modified with all the numbers of roll call votes. I will use the function getURL from the package RCurl, and then, I will use the function fromJSON to work on the obtained file and to put all the elements collected from the API requests into a big list in which I will find and extract all the votes cast. After gaining a large list with all the elements needed, I will use rbind and do.call to transform all the elements of the list into a data frame.
DATA ANALYSIS
Once I obtain the data frame with all the positions I have to define the value of the vote cast by each senator with respect to his or her party Leader. Considering the party division in the Senate, I built a table in which I consider all the possible positions of the senators (vote yes, note no, and not vote) related to the position of the Party Leader. To create a useful data frame with the values, I have to construct three new data frames, one for the Republicans, one for the Democrats, and one for the Independents, using filter with the name of the leader and merge with the data frame that contains all the votes cast. In these two I will compare the vote of Mitch McConnell and Chuck Schumer just by using ifelse crossing the value previously defined with the column of the data frame that contains all the votes in order to create a vector with all the values for on hundred senators. These three vectors have to be inserted in the previous data frames as three new columns that I will merge by using is.na because one column is composed just of the votes of the Republicans and NA if the senator came from the other parties, one by the votes of Democrats, and one for the votes cast by the Independents, while all the missing values are composed by NA. A final point, starting from this new column merged, I created a new data frame by using summarize_at, group_by, and mean and that new one will include:
- The last name of each senator, taken from the previous data frame as a character
- The related mean of the value, created in the previous data frame, grouping, summarizing, and making the mean of the previously recoded value of the vote
- The party of each senator, by adding the column in the sen_list data frame as a character
- The year of the first election, add as a vector
As can be seen in the following table, Using these four elements, I will show in the following graph how strong is the loyalty of each senator to his or her leader with a dot plot.
#Mitch McConnell
leaderR <- data_vote %>%
filter(name == "Mitch McConnell")
date <- rep(c(1:526), each = 1)
leaderR$date <- date
offR = merge(data_vote, leaderR, by = "date")
#Chuck Schumer
leaderD <- data_vote %>%
filter(name == "Charles E. Schumer")
date <- rep(c(1:526), each = 1)
leaderD$date <- date
offD = merge(data_vote, leaderD, by = "date")
#merge
mean_r <- ifelse(offR$vote_position.y == "Yes" & offR$vote_position.x == "Yes" & offR$party.x == "R", 0.5,
ifelse(offR$vote_position.y == "Yes" & offR$vote_position.x == "No" & offR$party.x == "R", 0,
ifelse(offR$vote_position.y == "Yes" & offR$vote_position.x == "Not Voting" & offR$party.x == "R", 0.25,
ifelse(offR$vote_position.y == "No" & offR$vote_position.x == "Yes" & offR$party.x == "R", 1.5,
ifelse(offR$vote_position.y == "No" & offR$vote_position.x == "No" & offR$party.x == "R", 0.5,
ifelse(offR$vote_position.y == "No" & offR$vote_position.x == "Not Voting" & offR$party.x == "R", 1,
ifelse(offR$vote_position.y == "Not Voting" & offR$vote_position.x == "Yes" & offR$party.x == "R", 1,
ifelse(offR$vote_position.y == "Not Voting" & offR$vote_position.x == "No" & offR$party.x == "R", 0.25,
ifelse(offR$vote_position.y == "Not Voting" & offR$vote_position.x == "Not Voting" & offR$party.x == "R", 0.5, NA)))))))))
mean_d <- ifelse(offD$vote_position.y == "Yes" & offD$vote_position.x == "Yes" & offD$party.x == "D", 1.5,
ifelse(offD$vote_position.y == "Yes" & offD$vote_position.x == "No" & offD$party.x == "D", 2,
ifelse(offD$vote_position.y == "Yes" & offD$vote_position.x == "Not Voting" & offD$party.x == "D", 1.75,
ifelse(offD$vote_position.y == "No" & offD$vote_position.x == "Yes" & offD$party.x == "D", 0.5,
ifelse(offD$vote_position.y == "No" & offD$vote_position.x == "No" & offD$party.x == "D", 1.5,
ifelse(offD$vote_position.y == "No" & offD$vote_position.x == "Not Voting" & offD$party.x == "D", 1,
ifelse(offD$vote_position.y == "Not Voting" & offD$vote_position.x == "Yes" & offD$party.x == "D", 1,
ifelse(offD$vote_position.y == "Not Voting" & offD$vote_position.x == "No" & offD$party.x == "D", 1.75,
ifelse(offD$vote_position.y == "Not Voting" & offD$vote_position.x == "Not Voting" & offD$party.x == "D", 1.5, NA)))))))))
mean_id <- ifelse(offD$vote_position.y == "Yes" & offD$vote_position.x == "Yes" & offD$party.x == "ID", 1.5,
ifelse(offD$vote_position.y == "Yes" & offD$vote_position.x == "No" & offD$party.x == "ID", 2,
ifelse(offD$vote_position.y == "Yes" & offD$vote_position.x == "Not Voting" & offD$party.x == "ID", 1.75,
ifelse(offD$vote_position.y == "No" & offD$vote_position.x == "Yes" & offD$party.x == "ID", 0.5,
ifelse(offD$vote_position.y == "No" & offD$vote_position.x == "No" & offD$party.x == "ID", 1.5,
ifelse(offD$vote_position.y == "No" & offD$vote_position.x == "Not Voting" & offD$party.x == "ID", 1,
ifelse(offD$vote_position.y == "Not Voting" & offD$vote_position.x == "Yes" & offD$party.x == "ID", 1,
ifelse(offD$vote_position.y == "Not Voting" & offD$vote_position.x == "No" & offD$party.x == "ID", 1.75,
ifelse(offD$vote_position.y == "Not Voting" & offD$vote_position.x == "Not Voting" & offD$party.x == "ID", 1.5, NA)))))))))
data_vote$mean_r <- mean_r
data_vote$mean_d <- mean_d
data_vote$mean_id <- mean_id
data_vote$merged[is.na(data_vote$mean_r)] <- data_vote$mean_d[is.na(data_vote$mean_r)]
data_vote$merged[is.na(data_vote$merged)] <- data_vote$mean_id[is.na(data_vote$merged)]
mean <- data.frame(summarise_at(group_by(data_vote, name),vars(merged),funs(mean(.,na.rm=TRUE))))
mean$office <- assumed_office
mean$party <- as.character(sen_list$party)
mean$name <- as.character(mean$name)
colnames(mean)[colnames(mean) == "merged"] = "loyalty"LOYALTY PLOT
The plot of the loyalty will be a simple dot plot: it will be based on the value obtained in the previous data frame and will show the differences in the mean of each senator concerning the value of the leader. On the x-axis, I will place the variable assumed_office that goes from 1975 to 2021 without any regularity. On the other hand, the y-axis is the variable loyalty and it is contained in a range between 0 and 2 and depends on the mean previously calculated. The dots in the plot are different, just for two main reasons:
- The different shape indicates the position in the US Senate. In fact Leader, Whip, and Conference Chair are the three main figures that lead the party caucus
- The different size is needed just because of better visualization of the graph
Thanks to these simple accouterments, the graph shows the distribution of the senators with respect to the party leaders that they elect at the beginning of each Congress. While in the Republican party each senator seems to vote quite differently with respect to the leaders, the Democrats seem to be more united, and this could be explained with the political division of the Senate that make the Democrats in the position of searching votes inside the opposition to pass a large part of their legislation. (2007)
p <- ggplot(mean, aes(x = office, y = loyalty, color = party, label = name)) +
geom_point(aes(shape = role, size = role_v)) +
scale_color_manual(values = c("blue", "green", "red")) +
scale_y_discrete()
ggplotly(p)SECOND RESEARCH QUESTION
DATA EXTRACTION
To request the information of the second research question, due to a technical problem with the API of Pro Publica, I switched sources and downloaded a folder directly from YouGov that contains 3687 XML with the text of all the introduced bills. With all these files in a folder on my computer, I just need to scrape the information, and to do the extraction I will use for loop with read_xml using the name of the files in the folder, create with str_c in the same loop, and then xml_find_all to extract the specified nodes that I need. In order to gather the full list of sponsors and cosponsors, I will use these functions two times, one to scrape the first list that contains one element for each bill and one to scrape the second list that has a different length for each bill, depending on the number of senators who have decided to sign the bill. As in the previous question, it will render me two lists with a lot of elements that I will reorganize into two data frames different using stri_list2matrix and as.data.frame adding them the number of the laws that will be useful in the following step.
spon_list <- list()
i <- 1
for (i in 1:3687){
path <- str_c("1 (", i, ").xml")
a <- read_xml(path)
cosp <- c(xml_find_all(a, ".//form/action/action-desc/sponsor") %>%
xml_text())
spon_list[[length(spon_list)+1]] <- cosp
}
cosp_list <- list()
z <- 1
for (z in 1:3687){
path <- str_c("1 (", z, ").xml")
a <- read_xml(path)
cosp <- c(xml_find_all(a, ".//form/action/action-desc/cosponsor") %>%
xml_text())
cosp_list[[length(cosp_list)+1]] <- cosp
}
spon_data <- as.data.frame(t(stri_list2matrix(spon_list)))
cosp_data <- as.data.frame(t(stri_list2matrix(cosp_list)))
number <- (1:3687)
cosp_data$number <- number
spon_data$number <- numberDATA ANALYSIS
From the XML files downloaded, I obtain two data frames:
- The first one contains the full list of the sponsors of each introduced bill, which contains the full list of the 3687 sponsors of all the introduced bills
- The second one contains the full list of the cosponsors of each introduced bill, that have almost 59 columns for each cosponsor with a lot of NA if there are less than 59 cosponsors
Using these two data frames the main objective is to create a new table in which I will have three columns: one for the sponsors, one for their cosponsors, and one with the number of relationships they had during all the first session of this Congress. To complete this task, I first need to use select_if and pivot_longer to create the data frame in which I have each sponsor related to its cosponsor and to the number of bills. Then, I have to use group_by and summarize to reduce the number of connections and create a new variable that shows the number of connections between sponsor and cosponsor. Observing this data frame I see a lot of missing values and errors, so I need to remove them by using filter with !is.na while I fix the error in the data using str_replace deleting the presence of multiple spaces and the title of each senator (Mr., Ms., and Mrs.). To the obtained data frame I just need to modify the order of the columns, placing the list of cosponsors in the first place using relocate because they are the starting point of the following of the network analysis. Starting from this data frame I have the possibility to create a network analysis over the connection to research the possible polarization of the US upper chamber of Congress. The following work over the data frame will be useful to the construction of the network analysis.
cosp_data$sponsor <- spon_data$V1
att <- cosp_data %>%
select_if(is.character) %>%
pivot_longer(!sponsor, names_to = "name", values_to = "cosponsor")
count <- att %>%
group_by(sponsor, cosponsor) %>%
summarize(n = n())
count <- count %>%
filter(!is.na(cosponsor)&!is.na(sponsor))
count$sponsor <- str_replace(count$sponsor, "M[rs][s]*\\.[\\s]*", "")
count$cosponsor <- str_replace(count$cosponsor, "M[rs][s]*\\.[\\s]*", "")
count$sponsor <- str_replace(count$sponsor, "Wyden[\\s]*", "Wyden")
count$cosponsor <- str_replace(count$cosponsor, "Wyden[\\s]*", "Wyden")
count$sponsor <- str_replace(count$sponsor, "Ernst[\\s]*", "Ernst")
count$cosponsor <- str_replace(count$cosponsor, "Ernst[\\s]*", "Ernst")
count[count == "Van [s]Hollen"] <- "Van Hollen"
count <- count %>%
relocate(cosponsor)CONCLUSION
Through the two questions, I have the possibility to observe loyalty and social connection between senators. As far as loyalty, it is possible to observe a greater distribution between the GOP that could be explained by the political division of the Senate: in fact, the Dems control just half of the Senate, and to pass a large part of their agenda they need the support of almost some senators of the opposition. On the other hand, the smaller distribution of the Dems is needed to pass almost some part of their agenda and it requires all the ability of the leader to keep control of his caucus.
With regard to the social connection, observing the graph it is possible to see that the well-known polarization cannot be found in the Senate because each senator has done almost one law with the other part. Also, the senators who were described as more progressist and the others described as more conservators had to try to push some piece of legislation together. In the end, it is visible a sort of division on one side the Dems and on the other the GOP, while in the middle there are a lot of centrist Senators that try to work with the two sides.
For this project, I used a lot of different packages and data extracted from ProPublica and GovInfo with the objective of the study of the sitting US Senators. I try to find an answer to my two research questions with recoding and graph that permit me to make an important analysis of the Senate.
SOCIAL NETWORK ANALYSIS
Starting from the data frame named count it is possible to create a social network that will show if there is a polarization in the Senate. Firstly, I start using the function merge between this data frame and the list of the senators to add the party to each sponsor and cosponsor. Then I have to prepare edges and nodes to be useful to the construction of the network. By using ifelse I create a new variable that will permit to use of different colors for the edges. Then, I have to prepare the vertices in a data frame, by adding the name, the size, and the party of each one. Using the function graph_from_data_frame I can create the basis of the network starting from the previously created data frame just adding vertices and direction. As a following passage, I define the color of each edges considering the variable fromto previously defined within the data frame count2 and adding to the network using E(network)$color. The last thing need is the layout and I decided to use Fruchterman-Reingold which seems to be the better choice for the value that I have. In the end, I put all these elements inside the plot to show the results of my social network analysis adding the title and the legend that consent to better visualization of the network. (2001)
Social Network Analysis